Scalable gastroscopic video summarization via similar-inhibition dictionary selection
Authors
Abstract
OBJECTIVE This paper aims at developing an automated gastroscopic video summarization algorithm to help clinicians review the abnormal content of a video more efficiently. METHODS AND MATERIALS To select the most representative frames from the original video sequence, we formulate gastroscopic video summarization as a dictionary selection problem. Unlike traditional dictionary selection methods, which consider only the number and reconstruction ability of selected key frames, our model introduces a similar-inhibition constraint to reinforce the diversity of the selected key frames. We calculate an attention cost by merging both gaze and content change into a prior cue, which helps select frames carrying more high-level semantic information. Moreover, we adopt an image quality evaluation step to eliminate interference from poor-quality images and a segmentation step to reduce computational complexity. RESULTS For the experiments, we build a new gastroscopic video dataset captured from 30 volunteers, comprising more than 400k images, and compare our method against state-of-the-art methods using content consistency, index consistency, and content-index consistency with the ground truth. Compared with all competitors, our method obtains the best results on 23 of 30 videos evaluated by content consistency, 24 of 30 videos evaluated by index consistency, and all videos evaluated by content-index consistency. CONCLUSIONS For gastroscopic video summarization, we propose an automated annotation method via similar-inhibition dictionary selection. Our model achieves better performance than other state-of-the-art models and provides key frames better suited for diagnosis. The developed algorithm can be automatically adapted to various real-world applications, such as training young clinicians, computer-aided diagnosis, or medical report generation.
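As an illustration only, the idea of selecting representative key frames while inhibiting overly similar picks can be sketched as a greedy procedure over frame features. This is a minimal stand-in, not the authors' actual formulation: the feature representation, the scoring rule, and the `lam` trade-off weight are all assumptions.

```python
import numpy as np

def select_key_frames(frames, k, lam=0.5):
    """Greedy key-frame selection with a similarity penalty.

    frames: (n, d) array of per-frame feature vectors (assumed input).
    Picks k frames that cover the sequence well while penalising
    candidates too similar to frames already chosen -- a simplified
    stand-in for the paper's similar-inhibition constraint.
    """
    n = frames.shape[0]
    # Cosine similarity between all pairs of frames.
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # Representativeness: average similarity to all frames.
            coverage = sim[i].mean()
            # Similar-inhibition: closeness to already-selected frames.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            score = coverage - lam * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)
```

With two tight clusters of frames and `k=2`, the redundancy penalty pushes the second pick into the other cluster, which is the diversity behaviour the similar-inhibition constraint is designed to enforce.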
Similar articles
Context-Aware Video Summarization
We present a method that is able to find the most informative video portions, leading to a summarization of video sequences. In contrast to existing works, our method is able to capture the important video portions through information about individual local motion regions, as well as the interactions between these motion regions. Specifically, our proposed Context-Aware Video Summarization ...
Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder
Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing number of videos every day. Despite the great progress achieved by prior works (e.g., frame-level video summarization), the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has been barely touched, which is more essent...
An integrated approach to summarization and adaptation using H.264/MPEG-4 SVC
The huge amount of multimedia content and the variety of terminals and networks make video summarization and video adaptation two key technologies to provide effective access and browsing. With scalable video coding, the adaptation of video to heterogeneous terminals and networks can be efficiently achieved using together a layered coding hierarchy and bitstream extraction. On the other hand, many...
Sparse Dictionary-based Attributes for Action Recognition and Summarization
We present an approach for dictionary learning of action attributes via information maximization. We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes. The objective function maximizes the mutual information between what has been learned and what remains to be learned in terms of appearance information and cl...
10-701 Machine Learning Final Project Report: Video Summarization via Deep Convolutional Networks
As the demand for video summarization techniques increases nowadays, many methods have been proposed for extracting the best representative key frames of a video. While most of them rely on hand-crafted image features, we resort to the feature learning power of deep convolutional networks. In this final project, we propose to learn a new image representation such that the similarity of frames is spec...
Journal: Artificial intelligence in medicine
Volume 66, Issue -
Pages: -
Publication date: 2016